Face manipulation detection has received considerable attention because of its importance to the reliability and security of face images. Recent studies, which have shown promising results, focus on using auxiliary information or prior knowledge to capture robust manipulation traces. The face depth map is an important face feature that has been shown to be effective in other areas, such as face recognition and face detection, yet it has received little attention in the literature on detecting manipulated face images. In this paper, we explore the possibility of incorporating the face depth map as auxiliary information to tackle face manipulation detection in real-world applications. To this end, we first propose a Face Depth Map Transformer (FDMT) that estimates the face depth map patch by patch from an RGB face image, which enables it to capture local depth anomalies introduced by manipulation. The estimated face depth map is then used as auxiliary information and integrated with the backbone features through a newly designed Multi-head Depth Attention (MDA) mechanism. Various experiments demonstrate the advantages of the proposed method for face manipulation detection.
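To make the idea of "depth as auxiliary information fused through attention" concrete, here is a minimal sketch under loudly stated assumptions: the hypothetical function `depth_attention` lets depth tokens supply the queries and backbone tokens supply the keys and values, and it uses no learned projections. This query/key assignment is an illustrative guess, not the FDMT/MDA design described in the abstract.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are patch tokens, columns are feature channels


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def depth_attention(backbone: Matrix, depth: Matrix, num_heads: int) -> Matrix:
    """Hypothetical depth-guided multi-head attention (no learned weights).

    Depth tokens act as queries; backbone tokens act as keys and values.
    Each head attends over its own slice of the feature channels.
    """
    n, d = len(backbone), len(backbone[0])
    if len(depth) != n or any(len(row) != d for row in depth):
        raise ValueError("depth and backbone must have the same shape")
    if num_heads <= 0 or d % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = d // num_heads
    scale = math.sqrt(head_dim)
    out = [[0.0] * d for _ in range(n)]
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        for i in range(n):
            query = depth[i][lo:hi]
            # Scaled dot product between patch i's depth token and every backbone token.
            scores = [
                sum(q * k for q, k in zip(query, backbone[j][lo:hi])) / scale
                for j in range(n)
            ]
            weights = softmax(scores)
            # Each output channel is a softmax-weighted average of backbone values.
            for c in range(lo, hi):
                out[i][c] = sum(w * backbone[j][c] for j, w in enumerate(weights))
    return out
```

Because each output row is a weighted average of backbone rows, an unusual depth token (for example, from a manipulated patch) changes which backbone patches are emphasized. A practical version would likely add learned query/key/value projections and a residual connection to the backbone features.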
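Steganography usually modifies cover media to embed secret data. A new steganographic approach called generative steganography (GS) has recently emerged, in which stego images (images containing secret data) are generated directly from the secret data without any cover media. However, existing GS schemes are often criticized for their poor performance. In this paper, we propose an advanced generative steganography network (GSN) that can generate realistic stego images without using cover images, and in which mutual information is introduced into stego image generation for the first time. Our model contains four sub-networks: an image generator ($G$), a discriminator ($D$), a steganalyzer ($S$), and a data extractor ($E$). $D$ and $S$ act as two adversarial discriminators that ensure the visual and statistical imperceptibility of the generated stego images. $E$ extracts the hidden secret from the generated stego images. The generator $G$ is flexibly constructed to synthesize either cover or stego images depending on its inputs. This facilitates covert communication by hiding the ability to generate stego images inside an ordinary image generator. A module named the secret block is designed to conceal secret data in the feature maps during image generation, achieving high hiding capacity and image fidelity. In addition, a novel hierarchical gradient decay technique is developed to resist steganalysis detection. Experiments demonstrate the superiority of our work over existing methods.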
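Online social networks have stimulated communication over the Internet more than ever, making it possible to transmit secret messages over such noisy channels. In this paper, we propose a cover-image-free steganography network named CIS-Net, which synthesizes high-quality images conditioned directly on the secret message to be transmitted. CIS-Net consists of four modules: the generation, adversarial, extraction, and noise modules. The receiver can extract the hidden message without any loss, even when the image has been distorted by JPEG compression attacks. To disguise the act of steganography, we collected images in the context of profile photos and stickers and trained our network accordingly. As a result, the generated images are more likely to escape malicious detection and attacks. Compared with previous image steganography methods, the main distinctions are robustness against various attacks and lossless extraction. Experiments on various public datasets demonstrate a superior ability to resist steganalysis.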
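Videos are vulnerable to tampering attacks that alter their meaning and deceive viewers. Previous video forgery detection schemes look for tiny clues to locate tampered regions. However, attackers can successfully evade such supervision by destroying these clues through video compression or blurring. This paper proposes a video watermarking network for tampering localization. We jointly train a 3D-UNet-based watermark embedding network and a decoder that predicts the tampering mask. The perturbation introduced by watermark embedding is nearly imperceptible. Since no off-the-shelf differentiable video codec simulator is available, we propose to approximate video compression by combining the simulation results of other typical attacks, such as JPEG compression and blurring. Experimental results show that our method generates watermarked videos with good imperceptibility and can robustly and accurately locate tampered regions in the attacked versions.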
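The development of Internet technology has continuously increased the reach and destructive power of rumors and fake news. Previous research on multimedia fake news detection has relied on a series of complex feature extraction and fusion networks to align features between images and text. However, what multimodal features consist of, and how features from different modalities affect the decision-making process, remain open questions. We present AURA, a multimodal fake news detection network with Adaptive Unimodal Representation Aggregation. We first extract representations separately from image patterns, image semantics, and text, and generate multimodal representations by feeding the semantic and linguistic representations into an expert network. We then perform coarse-level fake news detection and cross-modal consistency learning based on the unimodal and multimodal representations. The classification and consistency scores are mapped to modality-aware attention scores that readjust the features. Finally, we aggregate the weighted features and classify them for refined fake news detection. Comprehensive experiments on Weibo and GossipCop show that AURA outperforms several state-of-the-art fake news detection (FND) schemes, steadily improving both the overall prediction accuracy and the recall on fake news.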
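The step where scores are "mapped to modality-aware attention scores that readjust the features" can be illustrated with a small sketch. It assumes, hypothetically, that the mapping is a softmax over one score per modality and that the reweighted features are concatenated; the function names and modality keys are illustrative, not AURA's actual design.

```python
import math
from typing import Dict, List


def modality_attention(scores: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    """Map per-modality scores to attention weights that sum to 1 (softmax)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(scores.values())
    exps = {name: math.exp((s - top) / temperature) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def aggregate(features: Dict[str, List[float]], scores: Dict[str, float]) -> List[float]:
    """Scale each modality's feature vector by its weight and concatenate them."""
    weights = modality_attention(scores)
    fused: List[float] = []
    for name in sorted(features):  # fixed order so the fused layout is deterministic
        fused.extend(weights[name] * x for x in features[name])
    return fused
```

A modality whose score is higher (for example, one whose coarse classification is more confident) contributes a larger share of the fused vector passed to the final classifier; the temperature controls how sharply the weights favor it.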
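Image cropping is an inexpensive yet effective operation for maliciously altering image content. Existing cropping detection mechanisms analyze fundamental traces of image cropping, such as chromatic aberration and vignetting, to uncover cropping attacks. However, they are fragile to common post-processing attacks, which deceive forensics by removing such cues. Moreover, they ignore the fact that recovering the cropped-out content can reveal the purpose of the cropping attack. This paper proposes a novel robust watermarking scheme for image cropping localization and recovery (CLR-Net). We first protect the original image by introducing imperceptible perturbations. Typical image post-processing attacks are then simulated to erode the protected image. On the recipient's side, we predict the cropping mask and recover the original image. We propose two plug-and-play networks to improve the real-world robustness of CLR-Net, namely the fine-grained generative JPEG simulator (FG-JPEG) and the Siamese image pre-processing network. To the best of our knowledge, we are the first to address the combined challenge of image cropping localization and recovery of the entire image from a fragment. Experiments show that CLR-Net can accurately localize the cropping and recover the details of the cropped-out regions with high quality and fidelity, despite the presence of various types of image processing attacks.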
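Deep learning has achieved great success in various industrial applications. Companies do not want their valuable data to be stolen by malicious employees to train pirated models, nor do they want competitors to analyze the data after it has been used online. We propose a new solution for this scenario that robustly and reversibly transforms images into adversarial images. We develop a reversible adversarial example generator (RAEG) that introduces slight changes to images to fool traditional classification models. Even if malicious attackers train pirated models on defended versions of the protected images, RAEG can significantly weaken the functionality of these models. Meanwhile, the reversibility of RAEG preserves the performance of authorized models. Extensive experiments show that RAEG protects the data against adversarial defenses better than previous methods while introducing only slight distortion.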
Because multi-view (multi-source, multi-modal, multi-perspective, etc.) data offer more comprehensive information than data from a single view, they are increasingly used in remote sensing tasks. However, as the number of views grows, data quality issues become more apparent, limiting the potential benefits of multi-view data. Although recent deep neural network (DNN)-based models can learn the weight of each view adaptively, the lack of research on explicitly quantifying the data quality of each view during fusion leaves these models unexplainable and makes them perform unsatisfactorily and inflexibly in downstream remote sensing tasks. To fill this gap, this paper introduces evidential deep learning to aerial-ground dual-view remote sensing scene classification to model the credibility of each view. Specifically, evidence theory is used to calculate an uncertainty value that describes the decision-making risk of each view. Based on this uncertainty, a novel decision-level fusion strategy is proposed to ensure that the view with lower risk receives more weight, making the classification more credible. On two well-known, publicly available datasets of aerial-ground dual-view remote sensing images, the proposed approach achieves state-of-the-art results, demonstrating its effectiveness. The code and datasets of this article are available at https://github.com/gaopiaoliang/Evidential.
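The per-view uncertainty can be illustrated with the standard evidential deep learning parameterization: non-negative class evidence $e_k$ gives Dirichlet parameters $\alpha_k = e_k + 1$, strength $S = \sum_k \alpha_k$, beliefs $b_k = e_k / S$, and uncertainty $u = K / S$. The fusion rule below, which weights each view's expected class probabilities by $1 - u$, is a hypothetical stand-in chosen only to show "lower risk, more weight"; it is not the paper's fusion strategy, and the function names are illustrative.

```python
from typing import List, Tuple


def opinion(evidence: List[float]) -> Tuple[List[float], float]:
    """Belief masses and uncertainty from non-negative per-class evidence.

    Uses alpha_k = e_k + 1 with strength S = sum(alpha_k):
    belief b_k = e_k / S and uncertainty u = K / S, so sum(b) + u == 1.
    """
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    k = len(evidence)
    strength = sum(evidence) + k
    return [e / strength for e in evidence], k / strength


def fuse_views(view_evidence: List[List[float]]) -> List[float]:
    """Hypothetical decision-level fusion weighting each view by (1 - u)."""
    k = len(view_evidence[0])
    fused = [0.0] * k
    total_weight = 0.0
    for evidence in view_evidence:
        if len(evidence) != k:
            raise ValueError("all views must score the same classes")
        _, u = opinion(evidence)
        strength = sum(evidence) + k
        expected = [(e + 1) / strength for e in evidence]  # Dirichlet mean per class
        weight = 1.0 - u  # lower uncertainty (risk) -> larger weight
        total_weight += weight
        for c in range(k):
            fused[c] += weight * expected[c]
    if total_weight == 0.0:
        return [1.0 / k] * k  # no view has any evidence: fall back to uniform
    return [f / total_weight for f in fused]
```

For example, an aerial view with strong evidence for one class dominates a ground view that provides no evidence at all, because the latter has $u = 1$ and therefore zero weight.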
Video-language pre-training has improved performance on various downstream video-language tasks. However, most previous methods directly inherit or adapt typical image-language pre-training paradigms, and thus do not fully exploit the unique characteristic of video, namely its temporal dimension. In this paper, we propose HiTeA, a Hierarchical Temporal-Aware video-language pre-training framework with two novel pre-training tasks that model the cross-modal alignment between moments and texts as well as the temporal relations of video-text pairs. Specifically, we propose a cross-modal moment exploration task that explores moments in videos, yielding detailed video moment representations. In addition, a multi-modal temporal relation exploration task captures the inherent temporal relations by aligning video-text pairs as a whole at different time resolutions. Furthermore, we introduce the shuffling test to evaluate the temporal reliance of datasets and video-language pre-training models. We achieve state-of-the-art results on 15 well-established video-language understanding and generation tasks, especially on temporal-oriented datasets (e.g., SSv2-Template and SSv2-Label), with improvements of 8.6% and 11.1%, respectively. HiTeA also demonstrates strong generalization ability when transferred directly to downstream tasks in a zero-shot manner. Models and a demo will be available on ModelScope.
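The general idea behind a shuffling test can be sketched as follows, assuming (hypothetically) that it compares a model's accuracy on videos with frames in their original order against the same videos with frames randomly permuted; the function names and the toy models are illustrative and are not HiTeA's evaluation protocol.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[List[int], int]  # (frame sequence, label); frames are placeholders
Model = Callable[[Sequence[int]], int]


def accuracy(model: Model, data: List[Sample]) -> float:
    """Fraction of samples whose predicted label matches the ground truth."""
    return sum(model(frames) == label for frames, label in data) / len(data)


def shuffling_test(model: Model, data: List[Sample], seed: int = 0) -> Tuple[float, float]:
    """Return (accuracy on ordered frames, accuracy on shuffled frames)."""
    rng = random.Random(seed)  # seeded so the permutations are reproducible
    shuffled: List[Sample] = []
    for frames, label in data:
        permuted = list(frames)
        rng.shuffle(permuted)
        shuffled.append((permuted, label))
    return accuracy(model, data), accuracy(model, shuffled)
```

The gap `ordered - shuffled` is a rough indicator of temporal reliance: a model that only looks at an order-free summary of the frames (such as their sum) shows no gap, while a model that depends on frame order can lose accuracy once the order is destroyed.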
Implicit regularization is an important way to interpret neural networks. Recent theory has begun to explain implicit regularization with the model of deep matrix factorization (DMF) and to analyze the trajectory of discrete gradient dynamics during optimization. The steps of these discrete dynamics are relatively small but not infinitesimal, which fits well with how neural networks are trained in practice. So far, discrete gradient dynamics analysis has been applied successfully to shallow networks, but it becomes computationally difficult for deep networks. In this work, we introduce another discrete gradient dynamics approach to explain implicit regularization, namely landscape analysis, which focuses on characteristic regions of the landscape, such as saddle points and local minima. We theoretically establish the connection between saddle point escaping (SPE) stages and the matrix rank in DMF. We prove that, for rank-R matrix reconstruction, DMF converges to a second-order critical point after R stages of SPE. We further verify this conclusion experimentally on a low-rank matrix reconstruction problem. This work provides a new theory for analyzing implicit regularization in deep learning.
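For reference, the common DMF formulation from this line of work can be written as below, assuming a target matrix $M$ observed on an index set $\Omega$, a fixed (finite) step size $\eta > 0$, and a projection $P_\Omega$ that keeps the observed entries and zeroes the rest; empty matrix products are taken to be the identity. This is the standard setup in the DMF literature, not necessarily the exact setup of the paper.

```latex
\begin{aligned}
W &= W_N W_{N-1} \cdots W_1, \qquad
\mathcal{L}(W_1, \dots, W_N) = \tfrac{1}{2} \bigl\lVert P_\Omega(W - M) \bigr\rVert_F^2, \\
\nabla_{W_k} \mathcal{L} &= (W_N \cdots W_{k+1})^{\top} \, P_\Omega(W - M) \, (W_{k-1} \cdots W_1)^{\top}, \\
W_k^{(t+1)} &= W_k^{(t)} - \eta \, \nabla_{W_k} \mathcal{L}\bigl(W_1^{(t)}, \dots, W_N^{(t)}\bigr), \qquad k = 1, \dots, N, \\
&\text{second-order critical point:} \quad \nabla \mathcal{L} = 0, \qquad \nabla^2 \mathcal{L} \succeq 0.
\end{aligned}
```

The gradient for factor $W_k$ follows from the chain rule applied to $W = A\,W_k\,B$ with $A = W_N \cdots W_{k+1}$ and $B = W_{k-1} \cdots W_1$, which gives $A^{\top} (\nabla_W \ell)\, B^{\top}$. Because $\eta$ is finite, the iterates follow the discrete dynamics the abstract refers to rather than a continuous gradient flow.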